Parallel optimization sampling clustering K-means algorithm for big data processing
ZHOU Runwu, LI Zhiyong, CHEN Shaomiao, CHEN Jing, LI Renfa
Journal of Computer Applications    2016, 36 (2): 311-315.   DOI: 10.11772/j.issn.1001-9081.2016.02.0311
Abstract
To address the low accuracy and slow convergence of the K-means clustering algorithm, an improved K-means algorithm based on optimized sample clustering, named OSCK (Optimization Sampling Clustering K-means), was proposed. First, multiple samples were drawn from the massive data set by probability sampling. Second, based on the Euclidean-distance similarity principle of optimal cluster centers, the sample clustering results were modeled and evaluated, and the sub-optimal sample clustering results were discarded. Finally, the final k cluster centers were obtained by a weighted, integrated evaluation of the remaining clustering results, and these k centers were used as the cluster centers of the full big data set. Theoretical analysis and experimental results show that, for massive data analysis, the proposed method achieves better clustering accuracy than the comparison algorithm and has strong robustness and scalability.
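The three steps in the abstract (sample, evaluate and discard, combine by weighting) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the evaluation model, the weighting scheme, or the parallelization, so the sketch assumes uniform random sampling, uses each sample's sum of squared Euclidean distances (SSE) as the quality score, weights surviving results by inverse SSE, and matches centers across samples by nearest distance to the best result. All function and parameter names (`osck_sketch`, `n_samples`, `sample_size`, `keep`) are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the center closest to point."""
    return min(range(len(centers)), key=lambda j: sq_dist(point, centers[j]))


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's K-means on one sample; returns (centers, sse)."""
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Keep the previous center if a cluster ends up empty.
        centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
    sse = sum(sq_dist(p, centers[nearest(p, centers)]) for p in points)
    return centers, sse


def osck_sketch(data, k, n_samples=8, sample_size=200, keep=4, seed=0):
    """Sample, cluster each sample, drop weak results, merge the rest."""
    rng = random.Random(seed)
    # Step 1: draw several samples (assumption: uniform, without replacement).
    results = []
    for _ in range(n_samples):
        sample = rng.sample(data, min(sample_size, len(data)))
        results.append(kmeans(sample, k, rng))
    # Step 2: evaluate and discard (assumption: lower SSE means better).
    results.sort(key=lambda r: r[1])
    kept = results[:keep]
    reference = kept[0][0]
    # Step 3: weighted merge (assumption: weight = 1 / SSE, centers are
    # matched to the nearest center of the best-scoring result).
    dims = len(reference[0])
    sums = [[0.0] * dims for _ in range(k)]
    totals = [0.0] * k
    for centers, sse in kept:
        weight = 1.0 / (sse + 1e-12)
        for c in centers:
            j = nearest(c, reference)
            for d, value in enumerate(c):
                sums[j][d] += weight * value
            totals[j] += weight
    return [
        tuple(s / totals[j] for s in sums[j]) if totals[j] > 0 else reference[j]
        for j in range(k)
    ]


if __name__ == "__main__":
    gen = random.Random(1)
    blob_a = [(gen.gauss(0, 1), gen.gauss(0, 1)) for _ in range(500)]
    blob_b = [(gen.gauss(10, 1), gen.gauss(10, 1)) for _ in range(500)]
    print(osck_sketch(blob_a + blob_b, k=2))
```

Each sample is clustered independently, so the loop in step 1 is the natural place to parallelize. The merged centers can then be used to label the full data set in a single pass with `nearest`.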